Performance paradox

The performance paradox is a theory set forth by Marshall W. Meyer and Vipin Gupta in 1994, which posits that organizations are able to maintain control by not knowing what exactly performance is.^[1]^: 309 This theory is based on several facts of performance, namely that the number and type of performance measurements that exist are increasing at a rapid rate and that these new metrics tend to be weakly correlated with old ones.^[1]^: 309

Performance appraisals

In order to understand the performance paradox, it is helpful to first have a basic understanding of performance appraisals.

Performance appraisals, also known as performance evaluations, are assessments that many organizations use to measure individuals' productivity, ability and talent in their respective job positions.^[2] The goal of these appraisals is not only to measure each person's performance, but also to align employees' values, goals and motivations so that the organization as a whole performs better. While the implementation of performance evaluations has been characterized as beneficial and even essential for organizational success, many of these evaluations have become less effective over time due to both the excessive number of evaluation measures and employee reactivity to the evaluations.

Reasons

Performance appraisals present two strong benefits for organizations. First, reviewing an employee's performance in his or her responsibilities helps employers keep a consistent record of professional developments and recognize ways to improve an employee's productivity. Second, appraisals enable the relationships between managers, supervisors and their employees to be based on open communication and consistent constructive criticism. As a result, many managers have emphasized the value of using different performance measurement systems based on either financial or operational measures of performance.^[3]

Measures

Performance evaluations have been based on various operational or financial measures of performance, but no single factor provides a clear indication of productive or ineffective performance.^[3] Organizations have responded by basing performance assessment on a growing number of measures. The perspectives often considered when evaluating performance include the customer perspective, the internal business perspective, the innovation perspective and the financial perspective.

Since many organizations depend on customers for profit, companies primarily evaluate employees based on their performance with customers. These customer reviews are then used to shape how companies function internally, directing what kinds of goals employees should have to achieve the company's overall mission. Then, organizations can assess performance based on the products that employees create. Finally, financial performance measures should be a focus to identify how employee achievements contribute to the business' profitability.

These four measures extract important information about employees using performance evaluations. However, a rising market in the design of performance assessment has led to an overload of additional evaluation measures. Having so many measures of evaluation, and consequently multiple grading scales for assessment, has often led to ineffective performance appraisals.

Reactivity

Performance appraisals also become ineffective because of employee reactivity to the evaluations themselves. The concept of reactivity explains that evaluations meant to assess performance are often rendered futile because they affect employee performances.^[2] In other words, many performance appraisals do not accurately measure performance because employees react to being observed and evaluated. Because they can only test an employee's "test-taking" skills, not his or her objective accomplishments, evaluations appear unable to serve their purposes. Critics question the value and enduring existence of performance evaluations that lead employees to react and alter their behavior.

Efficacy

Excessive measures of performance and constant reactivity to performance appraisals challenge the very purpose of the system of performance appraisals, but performance evaluations can still offer organizations feedback about their practices. If performance evaluations did not accurately measure individuals' abilities and productivity, it would be fruitless to continue to execute a system of constant performance evaluations. However, a phenomenon known as the performance paradox seems to suggest that performance evaluations may not be altogether futile.

Characteristics of performance

The theory of performance paradox is grounded in three characteristics of performance measurement. Firstly, there are many performance metrics, and the number continues to grow.^[1]^: 317 Secondly, most measures of performance, even those that are used most frequently, exhibit little to no correlation with one another.^[1]^: 319 And thirdly, the dominant performance measures at any given point in time change continuously.^[1]^: 322

Multiple measures and growth of the performance measurement industry

Individuals and organizations have designed numerous ways to measure performance and continue to do so at an increasing rate.^[1]^: 317 Evidence for this observation can be found by tracking changes in performance measures over time. In the 19th century, companies measured their performance via industry-specific output and cost measures, such as newspaper circulation.^[1]^: 317–318 By the 1920s, companies also began utilizing accounting-based returns measures, such as return on investment.^[1]^: 318 And the 1960s and 1970s saw the emergence of purely financial measures of performance relating information on dividends and return on equity.^[1]^: 318 This dependence on performance measures has not diminished in recent years – on the contrary, the number of metrics that exist is growing at an even more accelerated rate.^[1]^: 318 Today, in addition to financial measures, organizations examine nonfinancial metrics regarding leadership, information, planning, human resource utilization, and customer satisfaction.^[1]^: 318 This proliferation in performance measurements has led to corresponding growth in the performance measurement industry – there has been a notable increase in the number of personnel and organizations devoted to examining performance-related metrics, such as certified public accountants and financial analysts.^[1]^: 318

Null correlations

Both individuals and organizations disagree about how best to define and measure performance.^[1]^: 319 As a result, many performance measures, even those that are most commonly used, tend to show little to no correlation with one another.^[1]^: 319 A multitude of studies has found that accounting and financial performance measures do not correspond closely, and that reputational performance indicators do not correspond with accounting and financial performance measures.^[1]^: 321 That these measurements have such weak relationships with one another makes it difficult to evaluate the overall performance of a company, as an organization could be deemed a success according to one metric and a failure according to another.
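The practical consequence of a null correlation can be illustrated with a small numerical sketch. The five firms and their figures below are invented for illustration and do not come from Meyer and Gupta; they are chosen so that the two measures are uncorrelated, in which case the firm that ranks best on one measure can rank worst on the other.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical data: five firms scored on two unrelated measures.
firms = ["A", "B", "C", "D", "E"]
return_on_equity_pct = [25, 20, 15, 10, 5]  # accounting measure (assumed values)
reputation_score = [1, 5, 4, 3, 2]          # survey measure, 1-5 (assumed values)

r = pearson(return_on_equity_pct, reputation_score)

# The same firm can top one ranking and sit at the bottom of the other.
best_by_roe = firms[return_on_equity_pct.index(max(return_on_equity_pct))]
worst_by_reputation = firms[reputation_score.index(min(reputation_score))]

print(f"correlation = {r:.2f}")
print(f"best by return on equity: {best_by_roe}")
print(f"worst by reputation: {worst_by_reputation}")
```

With these assumed figures, firm A is simultaneously the leader on return on equity and the laggard on reputation, so neither measure alone settles whether it is a success.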

Change in dominant measures

The dominant performance measures change over time.^[1]^: 322 A compilation of surveys examining the stated financial goals of companies over the century found the following: companies strove to maximize market share in the late 1960s; earnings per share in the mid-1970s; return on equity in the early 1980s; and cash flow and share prices in present day.^[1]^: 322 Other surveys found that organizations' preferences for evaluating performance related to capital expenditures have shifted dramatically over time.^[1]^: 323 For instance, in 1959, 13% of firms focused on internal rates of return while 86% of firms did so in 1988.^[1]^: 323 In addition, the percentage of firms utilizing accounting measures decreased from 50% in 1959 to 12% in 1988.^[1]^: 323 One potential reason that dominant performance measures change over time is that organizations replace old measures when they discover their limitations.^[1]^: 323

These three facts about performance form the foundation of the performance paradox. Accounting for how these facts come about is the next step in explaining this theory.

Explaining the facts of performance

Performance determinants

Before considering the factors that motivate the performance paradox, it is important to point out that characteristics known as comparability and variability differentiate good performance measures from bad ones.^[1]^: 310 Comparability is defined as the potential to utilize a performance measure across different settings to horizontally compare performance.^[1]^: 310 Variability is essential because it ensures that evaluations can be recorded on an outcome scale that is extended enough to vertically compare different levels of performance.^[1]^: 310 Within organizations, these two properties are impacted by the use of several weakly correlated performance measures through a mechanism termed the running down process.^[1]^: 310

The running down process

The running down process refers to the fact that the comparability and variability of performance measures "erode over time", prompting the perpetual need for new performance measures in the same setting.^[1]^: 324 Meyer and Gupta connect five key factors to the running down process: positive learning, perverse learning, selection, suppression, and external conditions.

The phenomenon of positive learning accounts for the fact that over time, the existence of specific performance measures can contribute to the improved performance of individuals, leading to a general decrease in the variability of results and thus less effective performance measures.^[1]^: 331 For example, in baseball, the diminished variability in batting averages in the 20th century is attributed to the improvement of players over time, but it has had the effect of devaluing batting averages as an effective performance measure in the industry.^[1]^: 338
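The loss of variability described here can be shown with a short sketch. The batting averages below are invented for illustration and are not historical figures; the sketch only shows how a measure whose spread shrinks becomes less able to tell performers apart, even when the typical value is unchanged.

```python
from statistics import mean, pstdev

# Hypothetical batting averages for five players in two eras (assumed values).
early_era = [0.220, 0.250, 0.280, 0.310, 0.340]
late_era = [0.260, 0.270, 0.280, 0.290, 0.300]

# Spread of the measure in each era (population standard deviation).
early_spread = pstdev(early_era)
late_spread = pstdev(late_era)

print(f"early era: mean {mean(early_era):.3f}, spread {early_spread:.4f}")
print(f"late era:  mean {mean(late_era):.3f}, spread {late_spread:.4f}")

# A smaller spread leaves less room to separate strong from weak performers.
print("variability has run down:", late_spread < early_spread)
```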

Conversely, perverse learning results in stagnating performance levels within an organization because it leads individuals to focus on improving their outcome in performance measures, rather than their actual performance.^[1]^: 339 For example, teachers may often dedicate their efforts towards improving their students' test scores rather than their teaching style.^[1]^: 339 Similar to positive learning, perverse learning leads to a decreased variability in measured performance levels, but this performance improvement is artificial.^[1]^: 339

Selection explains that performance measures decrease in variability within organizations because organizations learn to select better performers to evaluate.^[1]^: 340 For example, as the major league farm system has developed, teams have learned to select better batters and pitchers, contributing to the decrease in the variability of batting averages.^[1]^: 340

Suppression is explained by the fact that "organizations sometimes suppress persistent differences in performance".^[1]^: 341 For example, within the New York City school district, standardized testing scores vary greatly, and some administrators have advocated for a different reporting system that would make it much more difficult to differentiate performance levels between schools.^[1]^: 341

Finally, external factors can impact performance measures in the opposite direction of the running down process. For example, the turbulence of the commercial banking system in recent decades has served to disrupt the running down process of existing performance measures because the unpredictability of the industry makes it difficult for individuals to "learn" or "select" based on past factors and experiences.^[1]^: 343

Given that performance measures tend to erode over time, Meyer and Gupta call for new performance measures that evaluate the same properties but are not yet impacted by the running down process.^[1]^: 311 Ultimately, Meyer and Gupta state "The running down of existing measures and the appearance of new measures nearly orthogonal to existing ones yields a paradox of performance."^[1]^: 311

Orthogonal measures

When performance appraisal measures are run down, they typically need to be replaced by new measures. In the sciences, overlapping data are useful, in that they can be used to confirm or disprove a given hypothesis. In management, however, overlapping measurements are considered redundant, rather than a useful indication of reliability.^[1]^: 346 By the same token, new measures that lie in direct opposition to existing measures of performance are not helpful. For instance, if a retail company uses units of shoes sold in a month as a metric, adding units of shoes remaining unsold after a month as a new metric is not helpful. Since the company can derive the same information and draw the same conclusions from both metrics, it is more efficient to use only one of the two measures. In the interest of generating useful data, new performance measures should be orthogonal to existing metrics.
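The shoe example can be made concrete with a minimal sketch. It assumes a fixed monthly stock of 100 pairs and invented sales figures, neither of which comes from the source. Under that assumption the unsold count is fully determined by the sold count, so ranking months by either measure gives the same order.

```python
# Hypothetical figures: a fixed stock of 100 pairs is assumed for every month.
MONTHLY_STOCK = 100
units_sold = {"Jan": 40, "Feb": 55, "Mar": 70, "Apr": 85}

# The "new" metric is derived entirely from the existing one.
units_unsold = {month: MONTHLY_STOCK - sold for month, sold in units_sold.items()}

# Rank months from best to worst under each metric:
# more sold is better, fewer unsold is better.
ranking_by_sold = sorted(units_sold, key=units_sold.get, reverse=True)
ranking_by_unsold = sorted(units_unsold, key=units_unsold.get)

print(ranking_by_sold)
print(ranking_by_unsold)
print("second metric adds no information:", ranking_by_sold == ranking_by_unsold)
```

If stock varied from month to month, the unsold count would no longer follow from the sold count alone, which is why the fixed-stock assumption matters for this illustration.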

Orthogonality, or non-redundancy, does not necessarily indicate null correlation. Consider a secretary's performance, which might be measured by number of breaks per hour and the time required to complete reports. The two measures are orthogonal because they do not overlap. However, it is possible that repeated evaluations could show a reliable association between a higher number of breaks per hour and less time required to complete reports.

The history of General Electric provides a clear example of developing orthogonal performance measures. When GE dismantled its conglomerate in the 1950s, its existing performance measures, which relied on centralized budgetary targets, needed to evolve to suit the newly decentralized company. The 1951 GE Measurement Project provided a template for the new performance measures, which were orthogonal to the old performance measures, as well as to each other. The new measures were "profitability, market position, productivity, product leadership, personnel development, employee attitudes, public responsibility", and balance between short-term and long-term goals.^[1]^: 348 Thirty years later, when the company was in dire straits, the performance measurements were functionally consolidated into ranked profitability and growth. With this strategy, GE annually swept away the bottom 10% of its performers in profitability and growth. Once GE regained financial and market stability, the performance evaluation metrics changed in response, allegedly expanding and taking on more humanistic values. GE illustrates two important notes about changing performance metrics. First, new performance measures are most useful when they are unrelated to each other and to existing measures. Second, performance measures tend towards elaboration during times of security and profitability, and likewise tend towards consolidation during times of urgency and strain.^[1]^: 348–50

Orthogonality has been shown in the history of many industries, particularly to reflect changing expectations. American hospitals used to measure success by patient outcome. In the early 1900s, however, a study showed such dismal results with patient outcomes that the study and its results were burned, and hospitals instead evaluated performance by the keeping of records and adherence to procedures. With time, societal expectations of low patient mortality have led hospitals to reinstate patient outcomes as a measure of success.^[1]^: 345

Evolving technology has also forced the development of orthogonal measures. As early as 855, the success of texts was measured by print runs.^[4] In 1942, The New York Times began publishing a list of best-selling books, which has been shown to influence the purchases of the majority of American book buyers.^[5] The NYT Bestseller list is divided into sections, including fiction, non-fiction, and children's literature. With the advent of e-book technology, the NYT added an orthogonal e-book section to the list.

Comparison to existing models of performance

Several other performance evaluation models in the academic literature also describe performance measurement. What distinguishes the performance paradox from these other models is that they tend to restrict their scopes either to explanations for changing performance criteria or to endorsements of maximizing on a specific, defined set of static criteria. The performance paradox alone predicts that maximizing on a set of predictably shifting criteria generates positive outcomes for organizations, which most accurately describes how most organizations actually behave.

Maximizing model

Under the maximizing model, managers seek to maximize the firm's long-term value and evaluate performance solely on the firm's share price. The model, like performance paradox, adheres to maximizing on performance measures, but only views one benchmark as legitimate, expecting it never to change. Economists in particular hold to this model, arguing that it holds best in the long run. Thus, to enhance firm performance, executive managers' compensation must be directly tied to stock price performance – as accomplished with year-end bonuses and stock options.

Political model

The "political model", termed so by Meyer and Gupta because it "operates most openly in government, where a change in regime is followed swiftly by changes in policy and criteria used to assess policy outcomes",^[1]^: 354 has organizations seeking to maximize a specific set of performance measures. Under this model, though, metrics are defined by the top-level managers' preferences. Performance criteria and the measurement thereof are generally consistent within regimes; however, as the regimes themselves change over from one to the other, so do these measures. So while this model incorporates changing performance measures like performance paradox, because change is associated with some level of disruption and instability, agents view change in performance measures negatively and seek to avoid it. Furthermore, maximizing on performance measures is unlikely to bring about entirely positive outcomes because executive agents generally set goals while engaging in rent-seeking, using power to advance their own interests (whether they be acquiring more power, money, etc.).

Constituency model

The constituency model takes the political model's definition of performance and expands it to also include the preferences of the organization's other constituents, including workers and customers. As such, organizations are "assemblages of interest groups whose agendas are in competition, rather than maximizers" of any set of limited, defined criteria; instead, "something approaching social welfare, but not the welfare of any particular group, is maximized under the constituency model".^[1]^: 355 Under this model, as with the political model, strictly following performance measures does not actually provide the best outcome for the organization. In fact, since each measure reflects a particular subgroup's own interests rather than the interests of the organization as a whole, from an organizational standpoint, change in performance criteria would be highly beneficial. Change in performance measures only occurs, though, when either the constituencies themselves or the power structures between these constituencies change, making agents again eschew change because of the associated disruption and instability.

Business model

Under the last of these models, the business model, performance measurement incorporates several different metrics, as in the constituency model. However, whereas under the constituency model the measures came from competing factions (meaning that constituents prioritize among the measures and that the measures themselves are at odds with each other), under the business model the measures are generally sequential in nature (e.g., ensuring good product quality leads to a rise in the second metric of customer satisfaction, which improves the third measure of financial performance), and each is considered as important to maximize as the others. Again, as with the maximizing model, these measures are not expected to change.

References

[1] Meyer, Marshall W. and Vipin Gupta. 1994. "The Performance Paradox." Research in Organizational Behavior 16: 309–69.
[2] Espeland, Wendy Nelson and Michael Sauder. 2007. "Rankings and Reactivity: How Public Measures Recreate Social Worlds." American Journal of Sociology 113(1): 1–40.
[3] Kaplan, Robert S. and David P. Norton. 1992. "The Balanced Scorecard: Measures that Drive Performance." Harvard Business Review, Jan–Feb: 71–79.
[4] Barrett, T. H. 2005. "Religion and the first recorded print run: Luoyang, July, 855." Bulletin of the School of Oriental and African Studies, University of London 68(3): 455–461. doi:10.1017/s0041977x05000261. JSTOR 20181953. S2CID 162776266.
[5] Krakovsky, Marina. 6 December 2021. "Valuing Bestselling Books." Stanford Business School.
